documentation: http://pandas.pydata.org/pandas-docs/stable/visualization.html
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
The plot method on Series and DataFrame is a thin wrapper around plt.plot().
If the index consists of dates, it calls gcf().autofmt_xdate() to try to format the x-axis nicely, as shown in the plot window.
In [ ]:
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()
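Because plot() draws through matplotlib, it returns a matplotlib Axes object that can be customized further. A minimal sketch, assuming a small made-up daily series (the name small_ts, the title and the axis label are illustrative and not part of the original example):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: ten daily values, chosen only for illustration
small_ts = pd.Series(np.arange(10.0), index=pd.date_range('1/1/2000', periods=10))

# Series.plot() returns the matplotlib Axes it drew on
ax = small_ts.plot()

# Any regular Axes method can be applied afterwards
ax.set_title('Illustrative series')
ax.set_ylabel('value')
```

Call plt.show() afterwards to display the figure, as in the cell above.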
On DataFrame, plot() is a convenience method that plots all of the columns and includes a legend within the plot.
In [ ]:
df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range('1/1/2016', periods=1000), columns=list('ABCD'))
df = df.cumsum()
plt.figure()
df.plot()
plt.show()
You can plot one column versus another using the x and y keywords in plot():
In [ ]:
df3 = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()
df3['A'] = pd.Series(list(range(len(df3))))
df3.plot(x='A', y='B')
plt.show()
In [ ]:
df3.tail()
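The y keyword also accepts a list of column names, which draws several columns against the same x column. A small sketch under that assumption, using a hypothetical five-row frame named demo (not part of the original notebook) so the result is easy to check by eye:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical frame, small enough to verify visually
demo = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                     'B': [1.0, 3.0, 2.0, 5.0, 4.0],
                     'C': [2.0, 1.0, 4.0, 3.0, 6.0]})

# x names the column used for the horizontal axis;
# y may be a single column or a list of columns
ax = demo.plot(x='A', y=['B', 'C'])
```

Each column listed in y becomes its own line on the returned Axes. Call plt.show() to display it.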
Plotting methods allow for a handful of plot styles other than the default line plot. These styles are selected with the kind keyword argument to plot(). They include 'bar' or 'barh', 'hist', 'box', 'kde' or 'density', 'area', 'scatter', 'hexbin', and 'pie'.
For example, a bar plot can be created as follows:
In [ ]:
plt.figure()
df.iloc[5].plot(kind='bar')
plt.axhline(0, color='k')
plt.show()
In [ ]:
df.iloc[5]
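The kind keyword and the plot accessor methods (such as plot.bar(), used in the following cells) are two spellings of the same request. A minimal side-by-side sketch, assuming a hypothetical three-row frame named demo:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
demo = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1]})

# Two axes side by side so both spellings can be compared
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Keyword form
demo.plot(kind='bar', ax=left, title="plot(kind='bar')")

# Accessor form, which should draw the same bars
demo.plot.bar(ax=right, title='plot.bar()')
```

Both panels should show the same bars; call plt.show() to display them. The accessor form is often easier to discover through tab completion.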
In [ ]:
df2 = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df2.plot.bar(stacked=True)
plt.show()
In [ ]:
df2.plot.barh(stacked=True)
plt.show()
In [ ]:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.plot.box()
plt.show()
In [ ]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area()
plt.show()
Pandas tries to be pragmatic about plotting DataFrames or Series that contain missing data. Missing values are dropped, left out, or filled depending on the plot type.
Plot Type | NaN Handling
---|---
Line | Leave gaps at NaNs
Line (stacked) | Fill 0’s
Bar | Fill 0’s
Scatter | Drop NaNs
Histogram | Drop NaNs (column-wise)
Box | Drop NaNs (column-wise)
Area | Fill 0’s
KDE | Drop NaNs (column-wise)
Hexbin | Drop NaNs
Pie | Fill 0’s
If any of these defaults are not what you want, or if you want to be explicit about how missing values are handled, consider using fillna() or dropna() before plotting.
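As a rough illustration of the difference, the sketch below plots the same series three ways: with the default line-plot behaviour, after fillna(0), and after dropna(). The series gappy and its values are made up for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical series with two missing values
gappy = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0, 8.0])

# Explicit handling before plotting
filled = gappy.fillna(0)   # missing values become 0
dropped = gappy.dropna()   # missing values are removed, index keeps its gaps

fig, axes = plt.subplots(1, 3, figsize=(10, 3), sharey=True)
gappy.plot(ax=axes[0], style='o-', title='default (gaps)')
filled.plot(ax=axes[1], style='o-', title='fillna(0)')
dropped.plot(ax=axes[2], style='o-', title='dropna()')
```

In the first panel the line breaks at each NaN, in the second it dips to zero, and in the third the neighbouring points are joined directly. Call plt.show() to display the figure.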
In [ ]:
ser = pd.Series(np.random.randn(1000))
ser.plot.kde()
plt.show()
In [ ]:
from pandas.plotting import lag_plot
plt.figure()
data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(np.linspace(-99 * np.pi, 99 * np.pi, num=1000)))
lag_plot(data)
plt.show()
documentation: http://matplotlib.org/gallery.html